OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Effectively Searching Maps in Web Documents

Internal identifier: 000974 (Main/Exploration); previous: 000973; next: 000975

Authors: Qingzhao Tan [United States]; Prasenjit Mitra [United States]; Lee Giles [United States]

Source :

RBID : ISTEX:67CD86A12B89EBCD53AC72D3C9B7E9B7BE5DA55F

Abstract

Maps are an important source of information in archaeology and other sciences. Users want to search for historical maps to determine the recorded history of the political geography of regions in different eras, to find out exactly where archaeological artifacts were discovered, etc. Currently, they have to use a generic search engine and add the term "map" along with other keywords to search for maps. This crude method generates a significant number of false positives that the user must sift through to get the desired results. To reduce this manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that can help in distinguishing maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
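
The idea of weighting metadata fields differently when ranking maps can be illustrated with a minimal sketch. This is not the authors' algorithm: the field names, the weight values, and the simple term-count scoring are all assumptions chosen for illustration, since the abstract gives no formula or weights.

```python
from collections import Counter

# Hypothetical field weights (assumption): the abstract does not state values.
# Map-level fields are weighted above document-level fields here purely
# for illustration.
FIELD_WEIGHTS = {
    "caption": 3.0,         # map-level metadata
    "reference_text": 2.0,  # map-level metadata
    "title": 1.5,           # document-level metadata
    "abstract": 1.0,        # document-level metadata
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def score_map(query, fields, weights=FIELD_WEIGHTS):
    """Sum, over all fields, weight * number of query-term occurrences."""
    terms = set(tokenize(query))
    total = 0.0
    for name, text in fields.items():
        counts = Counter(tokenize(text))
        matches = sum(counts[term] for term in terms)
        total += weights.get(name, 0.0) * matches
    return total


def rank_maps(query, maps):
    """Return map ids ordered by descending score (ties broken by id)."""
    scored = [(score_map(query, fields), map_id) for map_id, fields in maps.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [map_id for _, map_id in scored]


# Toy data (invented for this example only).
example_maps = {
    "m1": {"caption": "Map of Roman Britain", "abstract": "pottery"},
    "m2": {"caption": "Pottery shards", "abstract": "roman britain settlement map"},
}
ranking = rank_maps("roman britain", example_maps)
```

With these assumed weights, a match in a map's caption counts three times as much as the same match in the abstract of the enclosing document, so `m1` is ranked ahead of `m2` for the query `roman britain`.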

Url:
DOI: 10.1007/978-3-642-00958-7_17



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Effectively Searching Maps in Web Documents</title>
<author>
<name sortKey="Tan, Qingzhao" sort="Tan, Qingzhao" uniqKey="Tan Q" first="Qingzhao" last="Tan">Qingzhao Tan</name>
</author>
<author>
<name sortKey="Mitra, Prasenjit" sort="Mitra, Prasenjit" uniqKey="Mitra P" first="Prasenjit" last="Mitra">Prasenjit Mitra</name>
</author>
<author>
<name sortKey="Giles, Lee" sort="Giles, Lee" uniqKey="Giles L" first="Lee" last="Giles">Lee Giles</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:67CD86A12B89EBCD53AC72D3C9B7E9B7BE5DA55F</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-00958-7_17</idno>
<idno type="url">https://api.istex.fr/document/67CD86A12B89EBCD53AC72D3C9B7E9B7BE5DA55F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000570</idno>
<idno type="wicri:Area/Istex/Curation">000562</idno>
<idno type="wicri:Area/Istex/Checkpoint">000496</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Tan Q:effectively:searching:maps</idno>
<idno type="wicri:Area/Main/Merge">000982</idno>
<idno type="wicri:Area/Main/Curation">000974</idno>
<idno type="wicri:Area/Main/Exploration">000974</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Effectively Searching Maps in Web Documents</title>
<author>
<name sortKey="Tan, Qingzhao" sort="Tan, Qingzhao" uniqKey="Tan Q" first="Qingzhao" last="Tan">Qingzhao Tan</name>
<affiliation wicri:level="4">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering, The Pennsylvania State University, PA 16802, University Park</wicri:regionArea>
<orgName type="university">Université d'État de Pennsylvanie</orgName>
<placeName>
<settlement type="city">University Park (Pennsylvanie)</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Mitra, Prasenjit" sort="Mitra, Prasenjit" uniqKey="Mitra P" first="Prasenjit" last="Mitra">Prasenjit Mitra</name>
<affiliation wicri:level="4">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering, The Pennsylvania State University, PA 16802, University Park</wicri:regionArea>
<orgName type="university">Université d'État de Pennsylvanie</orgName>
<placeName>
<settlement type="city">University Park (Pennsylvanie)</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Information Sciences and Technology, The Pennsylvania State University, PA 16802, University Park</wicri:regionArea>
<orgName type="university">Université d'État de Pennsylvanie</orgName>
<placeName>
<settlement type="city">University Park (Pennsylvanie)</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Giles, Lee" sort="Giles, Lee" uniqKey="Giles L" first="Lee" last="Giles">Lee Giles</name>
<affiliation wicri:level="4">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Computer Science and Engineering, The Pennsylvania State University, PA 16802, University Park</wicri:regionArea>
<orgName type="university">Université d'État de Pennsylvanie</orgName>
<placeName>
<settlement type="city">University Park (Pennsylvanie)</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Information Sciences and Technology, The Pennsylvania State University, PA 16802, University Park</wicri:regionArea>
<orgName type="university">Université d'État de Pennsylvanie</orgName>
<placeName>
<settlement type="city">University Park (Pennsylvanie)</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">67CD86A12B89EBCD53AC72D3C9B7E9B7BE5DA55F</idno>
<idno type="DOI">10.1007/978-3-642-00958-7_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Maps are an important source of information in archaeology and other sciences. Users want to search for historical maps to determine recorded history of the political geography of regions at different eras, to find out where exactly archaeological artifacts were discovered, etc. Currently, they have to use a generic search engine and add the term map along with other keywords to search for maps. This crude method will generate a significant number of false positives that the user will need to cull through to get the desired results. To reduce their manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that can help in distinguishing maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level-metadata e.g., captions, references to the maps in text, etc. and document-level metadata, e.g., title, abstract, citations, how recent the publication is, etc. and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
</region>
<settlement>
<li>University Park (Pennsylvanie)</li>
</settlement>
<orgName>
<li>Université d'État de Pennsylvanie</li>
</orgName>
</list>
<tree>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Tan, Qingzhao" sort="Tan, Qingzhao" uniqKey="Tan Q" first="Qingzhao" last="Tan">Qingzhao Tan</name>
</region>
<name sortKey="Giles, Lee" sort="Giles, Lee" uniqKey="Giles L" first="Lee" last="Giles">Lee Giles</name>
<name sortKey="Giles, Lee" sort="Giles, Lee" uniqKey="Giles L" first="Lee" last="Giles">Lee Giles</name>
<name sortKey="Giles, Lee" sort="Giles, Lee" uniqKey="Giles L" first="Lee" last="Giles">Lee Giles</name>
<name sortKey="Mitra, Prasenjit" sort="Mitra, Prasenjit" uniqKey="Mitra P" first="Prasenjit" last="Mitra">Prasenjit Mitra</name>
<name sortKey="Mitra, Prasenjit" sort="Mitra, Prasenjit" uniqKey="Mitra P" first="Prasenjit" last="Mitra">Prasenjit Mitra</name>
<name sortKey="Mitra, Prasenjit" sort="Mitra, Prasenjit" uniqKey="Mitra P" first="Prasenjit" last="Mitra">Prasenjit Mitra</name>
<name sortKey="Tan, Qingzhao" sort="Tan, Qingzhao" uniqKey="Tan Q" first="Qingzhao" last="Tan">Qingzhao Tan</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000974 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000974 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:67CD86A12B89EBCD53AC72D3C9B7E9B7BE5DA55F
   |texte=   Effectively Searching Maps in Web Documents
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024